Structure-inducing pre-training

Abstract

Language model pre-training, and the general-purpose methods derived from it, have reshaped machine learning research. However, there remains considerable uncertainty regarding why pre-training improves the performance of downstream tasks. This challenge is pronounced when language model pre-training is used in domains outside natural language. Here we investigate this problem by analysing how pre-training methods impose relational structure on induced per-sample latent spaces: that is, what constraints pre-training methods place on the distance or geometry between the pre-trained embeddings of samples. A comprehensive review reveals that this question remains open, despite theoretical analyses showing the importance of understanding this form of structure. Based on this review, we introduce a pre-training framework that enables a granular understanding of how relational structure can be induced. We present an analysis from first principles that establishes a connection between this inductive bias and fine-tuning performance. Empirical studies spanning three data modalities and ten tasks confirm the analyses, inform the design of novel pre-training methods and yield consistent improvements over a compelling suite of baseline methods.
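The core idea above, constraining the distance or geometry between the pre-trained embeddings of related samples, can be illustrated with a small contrastive term added alongside an ordinary pre-training loss. The PyTorch sketch below is a minimal illustration under that assumption; the class name RelationalStructureLoss, the margin value and the pair-sampling scheme are hypothetical choices for exposition, not the paper's released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F


class RelationalStructureLoss(nn.Module):
    """Contrastive constraint on per-sample embedding geometry: pull the
    embeddings of related samples together, push unrelated samples at
    least `margin` apart. Hypothetical sketch, not the paper's code."""

    def __init__(self, margin: float = 1.0):
        super().__init__()
        self.margin = margin

    def forward(self, z_a: torch.Tensor, z_b: torch.Tensor,
                related: torch.Tensor) -> torch.Tensor:
        # z_a, z_b: (batch, dim) embeddings of paired pre-training samples.
        # related:  (batch,) 1.0 if the pair is linked in the assumed
        #           relational structure (e.g. a shared label or graph edge),
        #           0.0 otherwise.
        dist = F.pairwise_distance(z_a, z_b)
        pull = related * dist.pow(2)                                 # attract linked pairs
        push = (1.0 - related) * F.relu(self.margin - dist).pow(2)   # repel unlinked pairs
        return (pull + push).mean()

# During pre-training this term would be added to the usual objective, e.g.
# loss = lm_loss + lambda_struct * RelationalStructureLoss()(z_a, z_b, related),
# where lm_loss, lambda_struct and the pair sampler are placeholders for
# whatever model and objective a given domain uses.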

Related articles

Pre-training Attention Mechanisms

Recurrent neural networks with differentiable attention mechanisms have had success in generative and classification tasks. We show that the classification performance of such models can be enhanced by guiding a randomly initialized model to attend to salient regions of the input in early training iterations. We further show that, if explicit heuristics for guidance are unavailable, a model tha...

Knowledge Transfer Pre-training

Pre-training is crucial for learning deep neural networks. Most existing pre-training methods train simple models (e.g., restricted Boltzmann machines) and then stack them layer by layer to form the deep structure. This layerwise pre-training has found strong theoretical foundation and broad empirical support. However, it is not easy to employ such a method to pre-train models without a clear ...

Inducing Structure for Vision and Language

The ability of children to solve complex learning problems during their first years of life has fascinated philosophers and researchers throughout time. While we are still far from completely understanding this process, there has been some interesting recent work in learning and language grounding (Regier, 2003; Roy, 2005; Yu et al., 2005). The computational models presented therein are capable...

Pre-Training CNNs Using Convolutional Autoencoders

Despite convolutional neural networks being the state of the art in almost all computer vision tasks, their training remains difficult. Unsupervised representation learning using a convolutional autoencoder can be used to initialize network weights and has been shown to improve test accuracy after training. We reproduce previous results using this approach and successfully apply it to th...

Pre-training of Hidden-Unit CRFs

In this paper, we apply the concept of pretraining to hidden-unit conditional random fields (HUCRFs) to enable learning on unlabeled data. We present a simple yet effective pre-training technique that learns to associate words with their clusters, which are obtained in an unsupervised manner. The learned parameters are then used to initialize the supervised learning process. We also propose a w...

Journal

Journal title: Nature Machine Intelligence

Year: 2023

ISSN: 2522-5839

DOI: https://doi.org/10.1038/s42256-023-00647-z